AAAI.2018 - Cognitive Systems | Cool Papers

#1 RUBER: An Unsupervised Method for Automatic Evaluation of Open-Domain Dialog Systems [PDF] [Copy] [Kimi]

Authors: Chongyang Tao ; Lili Mou ; Dongyan Zhao ; Rui Yan

Open-domain human-computer conversation has been attracting increasing attention over the past few years. However, there does not exist a standard automatic evaluation metric for open-domain dialog systems; researchers usually resort to human annotation for model evaluation, which is time- and labor-intensive. In this paper, we propose RUBER, a Referenced metric and Unreferenced metric Blended Evaluation Routine, which evaluates a reply by taking into consideration both a groundtruth reply and a query (previous user-issued utterance). Our metric is learnable, but its training does not require labels of human satisfaction. Hence, RUBER is flexible and extensible to different datasets and languages. Experiments on both retrieval and generative dialog systems show that RUBER has a high correlation with human annotation, and that RUBER has fair transferability over different datasets.

#2 Expected Utility with Relative Loss Reduction: A Unifying Decision Model for Resolving Four Well-Known Paradoxes [PDF] [Copy] [Kimi]

Authors: Wenjun Ma ; Yuncheng Jiang ; Weiru Liu ; Xudong Luo ; Kevin McAreavey

Some well-known paradoxes in decision making (e.g., the Allais paradox, the St. Peterburg paradox, the Ellsberg paradox, and the Machina paradox) reveal that choices conventional expected utility theory predicts could be inconsistent with empirical observations. So, solutions to these paradoxes can help us better understand humans decision making accurately. This is also highly related to the prediction power of a decision-making model in real-world applications. Thus, various models have been proposed to address these paradoxes. However, most of them can only solve parts of the paradoxes, and for doing so some of them have to rely on the parameter tuning without proper justifications for such bounds of parameters. To this end, this paper proposes a new descriptive decision-making model, expected utility with relative loss reduction, which can exhibit the same qualitative behaviours as those observed in experiments of these paradoxes without any additional parameter setting. In particular, we introduce the concept of relative loss reduction to reflect people's tendency to prefer ensuring a sufficient minimum loss to just a maximum expected utility in decision-making under risk or ambiguity.

#3 The Structural Affinity Method for Solving the Raven's Progressive Matrices Test for Intelligence [PDF] [Copy] [Kimi]

Authors: Snejana Shegheva ; Ashok Goel

Graphical models offer techniques for capturing the structure of many problems in real-world domains and provide means for representation, interpretation, and inference. The modeling framework provides tools for discovering rules for solving problems by exploring structural relationships. We present the Structural Affinity method that uses graphical models for first learning and subsequently recognizing the pattern for solving problems on the Raven's Progressive Matrices Test of general human intelligence. Recently there has been considerable work on computational models of addressing the Raven's test using various representations ranging from fractals to symbolic structures. In contrast, our method uses Markov Random Fields parameterized by affinity factors to discover the structure in the geometric analogy problems and induce the rules of Carpenter et al.'s cognitive model of problem-solving on the Raven's Progressive Matrices Test. We provide a computational account that first learns the structure of a Raven's problem and then predicts the solution by computing the probability of the correct answer by recognizing patterns corresponding to Carpenter et al.'s rules. We demonstrate that the performance of our model on the Standard Raven Progressive Matrices is comparable with existing state of the art models.

#4 Explicit Reasoning over End-to-End Neural Architectures for Visual Question Answering [PDF] [Copy] [Kimi]

Authors: Somak Aditya ; Yezhou Yang ; Chitta Baral

Many vision and language tasks require commonsense reasoning beyond data-driven image and natural language processing. Here we adopt Visual Question Answering (VQA) as an example task, where a system is expected to answer a question in natural language about an image. Current state-of-the-art systems attempted to solve the task using deep neural architectures and achieved promising performance. However, the resulting systems are generally opaque and they struggle in understanding questions for which extra knowledge is required. In this paper, we present an explicit reasoning layer on top of a set of penultimate neural network based systems. The reasoning layer enables reasoning and answering questions where additional knowledge is required, and at the same time provides an interpretable interface to the end users. Specifically, the reasoning layer adopts a Probabilistic Soft Logic (PSL) based engine to reason over a basket of inputs: visual relations, the semantic parse of the question, and background ontological knowledge from word2vec and ConceptNet. Experimental analysis of the answers and the key evidential predicates generated on the VQA dataset validate our approach.

#5 Emotional Chatting Machine: Emotional Conversation Generation with Internal and External Memory [PDF] [Copy] [Kimi]

Authors: Hao Zhou ; Minlie Huang ; Tianyang Zhang ; Xiaoyan Zhu ; Bing Liu

Perception and expression of emotion are key factors to the success of dialogue systems or conversational agents. However, this problem has not been studied in large-scale conversation generation so far. In this paper, we propose Emotional Chatting Machine (ECM) that can generate appropriate responses not only in content (relevant and grammatical) but also in emotion (emotionally consistent). To the best of our knowledge, this is the first work that addresses the emotion factor in large-scale conversation generation. ECM addresses the factor using three new mechanisms that respectively (1) models the high-level abstraction of emotion expressions by embedding emotion categories, (2) captures the change of implicit internal emotion states, and (3) uses explicit emotion expressions with an external emotion vocabulary. Experiments show that the proposed model can generate responses appropriate not only in content but also in emotion.

#6 Glass-Box Program Synthesis: A Machine Learning Approach [PDF] [Copy] [Kimi]

Authors: Konstantina Christakopoulou ; Adam Kalai

Recently proposed models which learn to write computer programs from data use either input/output examples or rich execution traces. Instead, we argue that a novel alternative is to use a glass-box scoring function, given as a program itself that can be directly inspected. Glass-box optimization covers a wide range of problems, from computing the greatest common divisor of two integers, to learning-to-learn problems. In this paper, we present an intelligent search system which learns, given the partial program and the glass-box problem, the probabilities over the space of programs. We empirically demonstrate that our informed search procedure leads to significant improvements compared to brute-force program search, both in terms of accuracy and time. For our experiments we use rich context free grammars inspired by number theory, text processing, and algebra. Our results show that (i) running our framework iteratively can considerably increase the number of problems solved, (ii) our framework can improve itself even in domain agnostic scenarios, and (iii) it can solve problems that would be otherwise too slow to solve with brute-force search.

#7 HAN: Hierarchical Association Network for Computing Semantic Relatedness [PDF] [Copy] [Kimi]

Authors: Xiaolong Gong ; Hao Xu ; Linpeng Huang

Measuring semantic relatedness between two words is a significant problem in many areas such as natural language processing. Existing approaches to the semantic relatedness problem mainly adopt the co-occurrence principle and regard two words as highly related if they appear in the same sentence frequently. However, such solutions suffer from low coverage and low precision because i) the two highly related words may not appear close to each other in the sentences, e.g., the synonyms; and ii) the co-occurrence of words may happen by chance rather than implying the closeness in their semantics. In this paper, we explore the latent semantics (i.e., concepts) of the words to identify highly related word pairs. We propose a hierarchical association network to specify the complex relationships among the words and the concepts, and quantify each relationship with appropriate measurements. Extensive experiments are conducted on real datasets and the results show that our proposed method improves correlation precision compared with the state-of-the-art approaches.

#8 Action Recognition From Skeleton Data via Analogical Generalization Over Qualitative Representations [PDF] [Copy] [Kimi]

Authors: Kezhen Chen ; Kenneth Forbus

Human action recognition remains a difficult problem for AI. Traditional machine learning techniques can have high recognition accuracy, but they are typically black boxes whose internal models are not inspectable and whose results are not explainable. This paper describes a new pipeline for recognizing human actions from skeleton data via analogical generalization. Specifically, starting with Kinect data, we segment each human action by temporal regions where the motion is qualitatively uniform, creating a sketch graph that provides a form of qualitative representation of the behavior that is easy to visualize. Models are learned from sketch graphs via analogical generalization, which are then used for classification via analogical retrieval. The retrieval process also produces links between the new example and components of the model that provide explanations. To improve recognition accuracy, we implement dynamic feature selection to pick reasonable relational features. We show the explanation advantage of our approach by example, and results on three public datasets illustrate its utility.

#9 Learning From Unannotated QA Pairs to Analogically Disambiguate and Answer Questions [PDF] [Copy] [Kimi]

Authors: Maxwell Crouse ; Clifton McFate ; Kenneth Forbus

Creating systems that can learn to answer natural language questions has been a longstanding challenge for artificial intelligence. Most prior approaches focused on producing a specialized language system for a particular domain and dataset, and they required training on a large corpus manually annotated with logical forms. This paper introduces an analogy-based approach that instead adapts an existing general purpose semantic parser to answer questions in a novel domain by jointly learning disambiguation heuristics and query construction templates from purely textual question-answer pairs. Our technique uses possible semantic interpretations of the natural language questions and answers to constrain a query-generation procedure, producing cases during training that are subsequently reused via analogical retrieval and composed to answer test questions. Bootstrapping an existing semantic parser in this way significantly reduces the number of training examples needed to accurately answer questions. We demonstrate the efficacy of our technique using the Geoquery corpus, on which it approaches state of the art performance using 10-fold cross validation, shows little decrease in performance with 2-folds, and achieves above 50% accuracy with as few as 10 examples.

#10 Style Transfer in Text: Exploration and Evaluation [PDF] [Copy] [Kimi]

Authors: Zhenxin Fu ; Xiaoye Tan ; Nanyun Peng ; Dongyan Zhao ; Rui Yan

The ability to transfer styles of texts or images, is an important measurement of the advancement of artificial intelligence (AI). However, the progress in language style transfer is lagged behind other domains, such as computer vision, mainly because of the lack of parallel data and reliable evaluation metrics. In response to the challenge of lacking parallel data, we explore learning style transfer from non-parallel data. We propose two models to achieve this goal. The key idea behind the proposed models is to learn separate content representations and style representations using adversarial networks. Considering the problem of lacking principle evaluation metrics, we propose two novel evaluation metrics that measure two aspects of style transfer: transfer strength and content preservation. We benchmark our models and the evaluation metrics on two style transfer tasks: paper-news title transfer, and positive-negative review transfer. Results show that the proposed content preservation metric is highly correlate to human judgments, and the proposed models are able to generate sentences with similar content preservation score but higher style transfer strength comparing to auto-encoder.

#11 Towards Building Large Scale Multimodal Domain-Aware Conversation Systems [PDF] [Copy] [Kimi]

Authors: Amrita Saha ; Mitesh Khapra ; Karthik Sankaranarayanan

While multimodal conversation agents are gaining importance in several domains such as retail, travel etc., deep learning research in this area has been limited primarily due to the lack of availability of large-scale, open chatlogs. To overcome this bottleneck, in this paper we introduce the task of multimodal, domain-aware conversations, and propose the MMD benchmark dataset. This dataset was gathered by working in close coordination with large number of domain experts in the retail domain. These experts suggested various conversations flows and dialog states which are typically seen in multimodal conversations in the fashion domain. Keeping these flows and states in mind, we created a dataset consisting of over 150K conversation sessions between shoppers and sales agents, with the help of in-house annotators using a semi-automated manually intense iterative process. With this dataset, we propose 5 new sub-tasks for multimodal conversations along with their evaluation methodology. We also propose two multimodal neural models in the encode-attend-decode paradigm and demonstrate their performance on two of the sub-tasks, namely text response generation and best image response selection. These experiments serve to establish baseline performance and open new research directions for each of these sub-tasks. Further, for each of the sub-tasks, we present a 'per-state evaluation' of 9 most significant dialog states, which would enable more focused research into understanding the challenges and complexities involved in each of these states.

#12 Complex Sequential Question Answering: Towards Learning to Converse Over Linked Question Answer Pairs with a Knowledge Graph [PDF] [Copy] [Kimi]

Authors: Amrita Saha ; Vardaan Pahuja ; Mitesh Khapra ; Karthik Sankaranarayanan ; Sarath Chandar

While conversing with chatbots, humans typically tend to ask many questions, a significant portion of which can be answered by referring to large-scale knowledge graphs (KG). While Question Answering (QA) and dialog systems have been studied independently, there is a need to study them closely to evaluate such real-world scenarios faced by bots involving both these tasks. Towards this end, we introduce the task of Complex Sequential QA which combines the two tasks of (i) answering factual questions through complex inferencing over a realistic-sized KG of millions of entities, and (ii) learning to converse through a series of coherently linked QA pairs. Through a labor intensive semi-automatic process, involving in-house and crowdsourced workers, we created a dataset containing around 200K dialogs with a total of 1.6M turns. Further, unlike existing large scale QA datasets which contain simple questions that can be answered from a single tuple, the questions in our dialogs require a larger subgraph of the KG. Specifically, our dataset has questions which require logical, quantitative, and comparative reasoning as well as their combinations. This calls for models which can: (i) parse complex natural language questions, (ii) use conversation context to resolve coreferences and ellipsis in utterances, (iii) ask for clarifications for ambiguous queries, and finally (iv) retrieve relevant subgraphs of the KG to answer such questions. However, our experiments with a combination of state of the art dialog and QA models show that they clearly do not achieve the above objectives and are inadequate for dealing with such complex real world settings. We believe that this new dataset coupled with the limitations of existing models as reported in this paper should encourage further research in Complex Sequential QA.

#13 Maximizing Activity in Ising Networks via the TAP Approximation [PDF] [Copy] [Kimi]

Authors: Christopher Lynn ; Daniel Lee

A wide array of complex biological, social, and physical systems have recently been shown to be quantitatively described by Ising models, which lie at the intersection of statistical physics and machine learning. Here, we study the fundamental question of how to optimize the state of a networked Ising system given a budget of external influence. In the continuous setting where one can tune the influence applied to each node, we propose a series of approximate gradient ascent algorithms based on the Plefka expansion, which generalizes the naive mean field and TAP approximations. In the discrete setting where one chooses a small set of influential nodes, the problem is equivalent to the famous influence maximization problem in social networks with an additional stochastic noise term. In this case, we provide sufficient conditions for when the objective is submodular, allowing a greedy algorithm to achieve an approximation ratio of 1-1/e. Additionally, we compare the Ising-based algorithms with traditional influence maximization algorithms, demonstrating the practical importance of accurately modeling stochastic fluctuations in the system.